Fix inlining behaviour at the NVVM IR level #246
PR NVIDIA#181 aimed to align the behaviour of the `inline` kwarg with that of upstream Numba, in that it now forces inlining at the Numba IR level. It turns out that this kwarg in Numba-CUDA already had the prior effect of enabling inlining at the NVVM IR level.

Because the default value of `inline` is `"never"`, this was interpreted by the `compile_cuda()` function as a `True`ish value and every device function got marked with the `alwaysinline` function attribute. This is a minor problem in that it probably forces a lot of inlining that we don't want, but also a major problem in that it triggers an NVVM bug, only resolved in CUDA 12.3, that causes a hang in `nvvmCompileProgram()`.

To rectify these issues, we add the `forceinline` kwarg to the `@cuda.jit` decorator and the `cuda.compile[_*]()` functions. Now, `compile_cuda()` will only enable inlining at the NVVM IR level for `forceinline` and not `inline`. This is aligned with the behaviour of upstream Numba (see numba/numba#10068). We now document the `inline` and `forceinline` kwargs to clarify the intent and behaviour for users.

For clarity, the behaviour is now:

- The `inline` kwarg enables inlining only at the Numba IR level.
- The `forceinline` kwarg enables inlining only at the NVVM IR level.
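The truthiness problem described above can be illustrated with a small standalone sketch. The two helper functions below are hypothetical simplifications written for this illustration; they are not the actual `compile_cuda()` code, and only the kwarg names `inline` and `forceinline` and the default `"never"` come from the description.

```python
# Hypothetical simplification for illustration; not the numba-cuda source.


def marks_alwaysinline_before(inline="never"):
    """Old behaviour: the string-valued `inline` kwarg was tested for truthiness."""
    # Any non-empty string is truthy in Python, including "never".
    return bool(inline)


def marks_alwaysinline_after(inline="never", forceinline=False):
    """New behaviour: only `forceinline` decides the NVVM IR level attribute."""
    # `inline` is accepted but ignored here; it only matters at the Numba IR level.
    return bool(forceinline)


print(marks_alwaysinline_before())                  # default "never" is truthy
print(marks_alwaysinline_after())                   # default no longer forces inlining
print(marks_alwaysinline_after(forceinline=True))   # explicit opt-in
```

With the defaults, the first helper reports that the function would be marked `alwaysinline`, while the second only does so when `forceinline=True` is passed explicitly.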
- Fix inlining behaviour at the NVVM IR level (NVIDIA#246 / NVIDIA#247)
| self.argtypes, | ||
| debug=self.debug, | ||
| lineinfo=lineinfo, | ||
| inline=inline, |
Just double checking: since this happens at the LLVM IR level, there is no need to pass `inline` here, because it only affects Numba IR?
Yeah, exactly - `inline` affects only Numba IR, `forceinline` affects only LLVM IR (which is how it should be when all is working correctly)
| ``"always"``. See `Notes on Inlining | ||
| <https://numba.readthedocs.io/en/stable/developer/inlining.html>`_. | ||
| :type inline: str | ||
| :param forceinline: Enables inlining at the NVVM IR level when set to |
Do we want it to be a bool, or do we want to expose LLVM IR attributes directly:
https://llvm.org/docs/LangRef.html#function-attributes
I want it to be bool so it's consistent with Numba. There is a plan for Numba to expose LLVM attributes directly, so I'll aim to align with that in future.
ZzEeKkAa
left a comment
LGTM, few comments!
Thank you for fixing it. I did not mean to break inlining with the original MR. There was a lack of docs explaining that behavior...
Not only that, but I didn't even know

@ZzEeKkAa Many thanks for the review - I've responded to the comments - let me know if I should follow up any further!

I'll merge this a little later today, as it seems to have been the resolution to RAPIDS hanging (in rapidsai/cudf#18688) in the v0.10.1 release - assuming there are no objections?